SPARQL Aggregation functions shouldn't build up memory for each row #678

tiko-tiko · 2017-01-05T14:29:33Z

TL;DR: I'd like to discuss:

- style guide
- missing comments
- additional tests to conform to
- implementation/buffer size of REDUCED

Feel free to comment on anything you like.

Some context:
I have been experimenting with a few SPARQL queries that do a lot of counting.
The idea was to find correlations among different predicates in order to find the most promising queries on a data set. One query yielded a few million combinations, which caused rdflib, even without the DISTINCT keyword, to consume enough memory to start swapping.

The cause seemed to be evalGroup(), which appended each incoming row to a list. I considered two approaches to avoid this: an OO-style approach, and Python's generators with their send() method. Both seemed to have similar memory requirements, so I chose the OO-style approach for better readability.
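To illustrate the two approaches mentioned above, here is a minimal sketch of aggregating "on the fly" so that memory grows with the number of groups rather than the number of rows. It is not the code from this patch and not rdflib's API: `CountAccumulator`, `SumAccumulator`, `aggregate` and `running_count` are hypothetical names, and rows are assumed to be plain dicts.

```python
from collections import defaultdict


class CountAccumulator:
    """Hypothetical COUNT accumulator: stores one integer, never the rows."""

    def __init__(self):
        self.value = 0

    def update(self, row):
        self.value += 1

    def result(self):
        return self.value


class SumAccumulator:
    """Hypothetical SUM accumulator over one key of each row."""

    def __init__(self, key):
        self.key = key
        self.value = 0

    def update(self, row):
        self.value += row[self.key]

    def result(self):
        return self.value


def aggregate(rows, group_key, make_accumulator):
    """OO style: feed each row to its group's accumulator, then discard it."""
    groups = defaultdict(make_accumulator)
    for row in rows:
        groups[row[group_key]].update(row)
    return {key: acc.result() for key, acc in groups.items()}


def running_count():
    """Generator style: each send(row) returns the count so far."""
    total = 0
    while True:
        _row = yield total  # the row is only counted, not stored
        total += 1


# Usage: rows arrive lazily from a generator and are never collected in a list.
rows = ({"p": p, "n": n} for p, n in [("a", 1), ("b", 2), ("a", 3)])
counts = aggregate(rows, "p", CountAccumulator)

counter = running_count()
next(counter)  # prime the generator so it can accept send()
counter.send({"p": "a"})
```

Both variants keep only a running value per group. The class-based one needs no priming step and exposes explicit `update`/`result` methods, which may be part of why it reads more easily.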

I tried to get a response on the IRC channel to ask about style guides and such, but during the festive season there was little activity. So I used the style guide I am used to. Please point out what to change to conform to your style guide.

After the refactoring I fixed the code to pass all the tests again. Are there additional requirements that are not covered by the test suite?

Also, I tried to implement a generic algorithm for the REDUCED keyword, although it is not directly related to my original problem.
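As a rough illustration of what a bounded, most-recently-used buffer for REDUCED could look like, here is a minimal sketch. SPARQL allows REDUCED to drop duplicates without requiring it, so a fixed-size buffer is permitted to miss some. This is an assumption about the general idea only, not the code from this patch: `reduced` and `buffer_size` are hypothetical names, and rows are assumed to be hashable.

```python
from collections import OrderedDict


def reduced(rows, buffer_size=3):
    """Yield rows, dropping duplicates still held in a bounded MRU buffer.

    Hypothetical sketch: rows must be hashable. A duplicate that has
    already been evicted from the buffer is emitted again, which REDUCED
    permits.
    """
    recent = OrderedDict()  # keys ordered from least to most recently seen
    for row in rows:
        if row in recent:
            recent.move_to_end(row)  # refresh its position, emit nothing
            continue
        recent[row] = None
        if len(recent) > buffer_size:
            recent.popitem(last=False)  # evict the least recently seen row
        yield row


# With buffer_size=3 the final 1 is emitted again because it was evicted.
result = list(reduced([1, 1, 2, 1, 3, 4, 5, 1], buffer_size=3))
# result == [1, 2, 3, 4, 5, 1]
```

The buffer size trades memory for how many duplicates get removed: a larger buffer catches more repeats, while memory stays bounded regardless of input length.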

coveralls · 2017-01-05T14:42:53Z

Coverage increased (+0.07%) to 62.909% when pulling 636f685 on tiko-tiko:sparql-aggregates into 2a869ca on RDFLib:master.

gromgull · 2017-01-11T20:25:34Z

Hi @tiko-tiko !

This looks very interesting! And well done on making sense of the SPARQL Engine code! Although I wrote a lot of it in the first place, whenever I look at it now I am often confused.

- For style we're not very strict, mainly PEP 8.
- If the tests pass, this is good enough!
- What comments are you missing? I mean, there are MANY comments missing, but anything in particular?

tiko-tiko · 2017-01-11T21:36:35Z

Hi @gromgull !

Thank you for your reply. Regarding the comments: for example, I know each class, method and function could have a docstring. On the other hand, given the OO nature of my code, this could be a little repetitive. I am looking for parts of my patch which need more explanation, but my view on this matter is rather biased at the moment. I understand that eventually I will wish I had written more comments.
Could you please tell me which parts seem unclear to you, so I can clarify?

Thanks, tiko-tiko

gromgull · 2017-01-12T09:47:40Z

Then I misunderstood. I thought you asked about more comments in existing code. Your code is fine; I think comments are just waiting to be made wrong and outdated by changes to code anyway :)

tiko-tiko added 3 commits January 5, 2017 14:13

Implement "on the fly" aggregation

2a4b9ba

Implement "on the fly" aggregation

46eeb06

Implement variable MRU style REDUCED

636f685

gromgull merged commit 941b739 into RDFLib:master Jan 12, 2017

tiko-tiko deleted the sparql-aggregates branch January 14, 2017 11:30

gromgull mentioned this pull request Jan 23, 2017

SPARQL: COUNT DISTINCT not working properly #404

Closed

joernhees added the enhancement, performance, and SPARQL labels Jan 25, 2017

joernhees added this to the rdflib 4.2.2 milestone Jan 25, 2017

pyup-bot mentioned this pull request Jan 29, 2017

Update rdflib to 4.2.2 mytardis/mytardis#815

Merged

This was referenced Mar 16, 2017

Initial Update mozilla/amo-validator#510

Closed

Update rdflib to 4.2.2 mozilla/amo-validator#515

Closed
